Sparse high-dimensional linear regression. Estimating squared error and a phase transition
Authors
Abstract
We consider a sparse high-dimensional regression model where the goal is to recover a k-sparse unknown binary vector β∗ from n noisy linear observations of the form Y = Xβ∗ + W ∈ R^n, where X ∈ R^{n×p} has i.i.d. N(0,1) entries and W ∈ R^n has i.i.d. N(0,σ²) entries. In the high signal-to-noise ratio and sublinear sparsity regime, while the sample size needed to recover β∗ information-theoretically is known to be of order n∗ := 2k log p / log(k/σ² + 1), no polynomial-time algorithm is known to succeed unless n > nalg := (2k + σ²) log p. In this work, we offer a series of results investigating multiple computational and statistical aspects of the recovery task in the regime n ∈ [n∗, nalg]. First, we establish a novel information-theoretic property of the MLE of the problem taking place around n = n∗ samples, which we coin an "all-or-nothing behavior": when n > n∗ the MLE recovers almost perfectly the support of β∗, while if n < n∗ it fails to do so. Second, we establish that the problem exhibits the Overlap Gap Property (OGP) when n is below a constant multiple of nalg, and that above a constant multiple of nalg the OGP disappears. The OGP is a geometric "disconnectivity" property which initially appeared in the theory of spin glasses and is known to suggest algorithmic hardness when it occurs. Finally, using certain technical results obtained in the study of this transition, we additionally establish various positive and negative algorithmic results of interest, including the failure of LASSO with access to n ∈ [n∗, nalg] samples and the success of a simple local search method with the same number of samples.
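The sampling model and the two sample-size thresholds above can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not code from the paper; the parameter values k, p, σ² below are arbitrary choices for demonstration.

```python
import numpy as np

def thresholds(k, p, sigma2):
    """Information-theoretic and algorithmic sample-size thresholds:
    n* = 2k log p / log(k/sigma^2 + 1) and n_alg = (2k + sigma^2) log p."""
    n_star = 2 * k * np.log(p) / np.log(k / sigma2 + 1.0)
    n_alg = (2 * k + sigma2) * np.log(p)
    return n_star, n_alg

def sample_model(n, p, k, sigma2, rng):
    """Draw one instance of Y = X beta* + W with a k-sparse binary beta*."""
    beta = np.zeros(p)
    beta[rng.choice(p, size=k, replace=False)] = 1.0  # k-sparse binary signal
    X = rng.standard_normal((n, p))                   # i.i.d. N(0,1) design
    W = np.sqrt(sigma2) * rng.standard_normal(n)      # i.i.d. N(0, sigma^2) noise
    return X, X @ beta + W, beta

rng = np.random.default_rng(0)
k, p, sigma2 = 10, 1000, 1.0  # illustrative values only
n_star, n_alg = thresholds(k, p, sigma2)
# The conjectured computationally hard regime is sample sizes n in (n*, n_alg).
print(n_star < n_alg)  # prints True for these parameters
```

The gap between n∗ and nalg is where the abstract places the statistical-computational trade-off: instances are drawn exactly as in `sample_model`, but with n in that interval no polynomial-time recovery algorithm is known.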
Similar resources
High Dimensional Regression with Binary Coefficients. Estimating Squared Error and a Phase Transition
We consider a sparse linear regression model Y = Xβ∗ + W where X is an n × p matrix with i.i.d. Gaussian entries, W is an n × 1 noise vector with i.i.d. mean-zero Gaussian entries of standard deviation σ, and β∗ is a p × 1 binary vector with support size (sparsity) k. Using a novel conditional second moment method, we obtain an approximation, tight up to a multiplicative constant, of the optimal squared error m...
Nearly Optimal Minimax Estimator for High Dimensional Sparse Linear Regression
We present estimators for a well-studied statistical estimation problem: estimation in the linear regression model with soft sparsity constraints (ℓq constraint with 0 < q ≤ 1) in the high-dimensional setting. We first present a family of estimators, called the projected nearest neighbor estimator, and show, by using results from convex geometry, that such an estimator is within a logarithmic ...
Robust Estimation in Linear Regression with Multicollinearity and Sparse Models
One of the factors affecting the statistical analysis of data is the presence of outliers. Methods that are not affected by outliers are called robust methods. Robust regression methods are robust estimation methods for regression model parameters in the presence of outliers. Besides outliers, the linear dependency of regressor variables, which is called multicollinearity...
Robust High-Dimensional Linear Regression
The effectiveness of supervised learning techniques has made them ubiquitous in research and practice. In high-dimensional settings, supervised learning commonly relies on dimensionality reduction to improve performance and identify the most important factors in predicting outcomes. However, the economic importance of learning has made it a natural target for adversarial manipulation of trainin...
Estimating a Bounded Normal Mean Relative to Squared Error Loss Function
Let be a random sample from a normal distribution with unknown mean and known variance. The usual estimator of the mean, i.e., the sample mean, is the maximum likelihood estimator, which under the squared error loss function is a minimax and admissible estimator. In many practical situations, the mean is known in advance to lie in an interval, say for some . In this case, the maximum likelihood estimator...
Journal
Journal title: Annals of Statistics
Year: 2022
ISSN: 0090-5364, 2168-8966
DOI: https://doi.org/10.1214/21-aos2130